Abstract

This paper concerns the efficient implementation of quantum circuits for qudits. We show that controlled two-qudit gates can be implemented without ancillas and prove that the gate library containing arbitrary local unitaries and one two-qudit gate, CINC, is exact-universal. A recent paper [S. Bullock, D. O'Leary, and G. K. Brennen, Phys. Rev. Lett. 94, 230502 (2005)] describes quantum circuits for qudits which require $O(d^n)$ two-qudit gates for state synthesis and $O(d^{2n})$ two-qudit gates for unitary synthesis, matching the respective lower bound complexities. In this work, we present the state synthesis circuit in much greater detail and prove that it is correct. Also, the $\lceil (n-2)/(d-2) \rceil$ ancillas required in the original algorithm may be removed without changing the asymptotics. Further, we present a new algorithm for unitary synthesis, inspired by the QR matrix decomposition, which is also asymptotically optimal.

1 Introduction 

A qudit is a d-level generalization of a qubit, i.e. the one-qudit Hilbert space splits orthogonally as

$\mathcal{H}(1,d) = \mathbb{C}\{|0\rangle\} \oplus \mathbb{C}\{|1\rangle\} \oplus \cdots \oplus \mathbb{C}\{|d-1\rangle\}$ (1)

while the n-qudit state-space is $\mathcal{H}(n,d) = [\mathcal{H}(1,d)]^{\otimes n}$. Thus for $N = d^n$, closed-system evolutions of n qudits are modeled by $N \times N$ unitary matrices. Qudit circuit diagrams then factor such unitaries into two-qudit operations $I_{d^{n-2}} \otimes V$ where V is a $d^2 \times d^2$ unitary matrix, or more generally into similarity transforms of such gates by particle-swaps. The algorithmic complexity of an evolution may then be thought of as the number of two-qudit gates required to build it. A degree of freedom argument [5] leads one to guess that exponentially many gates are required for most unitary evolutions, since the space of all $N \times N$ unitary matrices is $d^{2n}$-dimensional. Indeed, this space of evolutions is a manifold so the argument may be made rigorous using smooth topology, and thus $\Omega(d^{2n})$ gates are required for exact-universality. Yet until quite recently the best qudit circuits contained $O(n^2 d^{2n})$ gates (lfj. In contrast, $O(4^n)$ gates were known to suffice for qubits (d = 2) [14], presenting the possibility that qudits are genuinely less efficient for d not a power of two.

Quite recently, an explicit $O(d^{2n})$ construction was achieved [4]. It uses the spectral decomposition of the unitary matrix desired and also a new state synthesis circuit (6||9j[n]|2|. Given a $|\psi\rangle \in \mathcal{H}(n,d)$, a state-synthesis circuit for $|\psi\rangle$ realizes some unitary U such that $U|\psi\rangle = |0\rangle$. There are $2d^n - 2$ real degrees of freedom in a normalized state ket $|\psi\rangle$, which may be used to prove that circuits for generic states cost $\Omega(d^n)$ two-qudit gates. This is in sharp contrast to the case of classical logic, where $O(n)$ inverters may produce any bit-string. The most recent qudit state-synthesis circuit [4] contains $(d^n - 1)/(d - 1)$ two-qudit gates, and in fact each is a singly-controlled one-qudit operator $\wedge_1(V) = I_{d^2-d} \oplus V$.

There are two ways to employ an asymptotically optimal state synthesis circuit in order to obtain asymptotically optimal unitary circuits. The first is to exploit the spectral decomposition, which involves a three part circuit for each eigenstate of the unitary: building an eigenstate [9][4], applying a conditional phase to one logical basis ket, and unbuilding the eigenstate. We here introduce a second option, the Triangle algorithm, which uses the state-synthesis circuit with extra controls to reduce the unitary to upper triangular form. Recursive counts of the number of control boxes show that it is also asymptotically optimal (cf. [14]). Although these algorithms are unlikely to be used to implement general unitary matrices, they can be usefully applied to improving subblocks of larger circuits (peephole optimization).

Finally, this work also addresses two further topics in which qudit circuits lag behind qubit circuits. First, to date the smallest gate library for exact universality with qudits uses arbitrary locals complemented by a continuous one-parameter two-qudit gate [3]. In contrast, it is well known [6] that any computation on qubits can be realized using gates from the library $\{U(2)^{\otimes n}, \text{CNOT}\}$. We prove that the library $\{U(d)^{\otimes n}, \text{CINC}\}$, where CINC is the qudit generalization of the CNOT gate, is exactly universal. Second, the first asymptotically optimal qubit quantum circuit exploited a single ancilla qubit [14] and current constructions require none [2][11], while qudit diagrams tend to suppose $\lceil (n-2)/(d-2) \rceil$ ancilla qudits. Here we present methods which realize a k-controlled operation $\wedge_k(V) = I_{d^{k+1}-d} \oplus V$ in $O[(k+2)^{2+\log_2 d}]$ gates without the need for any ancilla. This makes all qudit asymptotics competitive with their qubit counterparts. However, it is not known whether the Cosine-Sine Decomposition (CSD) is useful for building qudit circuits, despite the fact that all best-practice qubit exact universal circuits exploit this matrix decomposition.

The paper is organized as follows. §3 improves on earlier constructions of $\wedge_1(V)$ gates, which are ubiquitous in later sections. §4 presents a new circuit for qudit $\wedge_k(V)$ gates, which are later used to produce the first $O(d^{2n})$-gate unitary circuits without ancilla. §5 details the recent state synthesis algorithm as an iteration over a new ♣-sequence and exploits the new constructions to prove it is correct. §6 presents a new asymptotically optimal unitary circuit inspired by the QR matrix factorization and compares it with a previous algorithm based on spectral decomposition. §7 discusses two applications of the state synthesis algorithm.

2 Notation and conventions 

The Hilbert spaces $\mathcal{H}(1,d)$ and $\mathcal{H}(n,d)$ are defined in the introduction. On $\mathcal{H}(1,d)$, the inverter for bits has two important generalizations for dits:

$(\sigma_x \oplus I_{d-2})|j\rangle = \begin{cases} |1\rangle, & j = 0 \\ |0\rangle, & j = 1 \\ |j\rangle, & \text{otherwise} \end{cases} \qquad \text{INC}|j\rangle = |(j+1) \bmod d\rangle$ (2)

We use the latter symbol rather than the more typical X since this operation is a modular increment. This leads to two generalizations of the quantum controlled-not, $\wedge_1(\sigma_x \oplus I_{d-2})$ and $\wedge_1(\text{INC}) = \text{CINC}$. The usual symbol for a controlled-not when appearing in a qudit circuit diagram refers to CINC. Controls represented by a black bubble in qudit circuit diagrams fire on control state $|d-1\rangle$.

As new notation, the ♣-sequence is introduced in §5.2. This plays a role analogous to the Gray code in earlier d = 2 constructions and is a particular sequence of words of n letters. Although these words might themselves be called sequences, we prefer to call an individual word (e.g. 1100♣♣) a term and reserve "sequence" exclusively for the ♣-sequence of $(d^n - 1)/(d - 1)$ terms.

3 Optimizing singly-controlled one-qudit unitaries 

Several operators $\wedge_1(V)$ appear in later circuits. Thus, it is worthwhile to optimize this computation in our gate libraries. For qubits, CNOT-optimal circuits for $\wedge_1(V)$ are known [12]. The qudit case is open. Here we improve the $\wedge_1(V)$ circuit of earlier work [3] and further prove for the first time that $U(d)^{\otimes n} \cup \{\text{CINC}, \text{CINC}^{-1}\}$ is exact-universal. Since $\text{CINC}^{-1} = \text{CINC}^{d-1}$, this also demonstrates that $U(d)^{\otimes n} \cup \{\text{CINC}\}$ is exact-universal. This is a smaller universal library than that presented in earlier work [3].

Thus, consider the question of factoring $\wedge_1(V)$. Let $\{|\psi_k\rangle\}_{k=0}^{d-1}$ be the eigenkets of V with eigenvalues $\{e^{i\theta_k}\}_{k=0}^{d-1}$. Let $W_k$ be some one-qudit unitary with $W_k|0\rangle = |\psi_k\rangle$, e.g. the appropriate one-qudit Householder reflection (see §5.1). Finally, let $\Phi_k$ be a controlled one-qudit phase unitary given by $\Phi_k = \wedge_1[I_d + (e^{i\theta_k} - 1)|0\rangle\langle 0|]$. Then note that $V = \prod_{k=0}^{d-1} W_k [I_d + (e^{i\theta_k} - 1)|0\rangle\langle 0|] W_k^\dagger$. Thus $\wedge_1(V)$ can be implemented by the following circuit, written here as an operator product:

$\wedge_1(V) = \prod_{k=0}^{d-1} (I_d \otimes W_k)\, \Phi_k\, (I_d \otimes W_k^\dagger)$ (3)

Thus, we have reduced the question to building $\Phi_k$ in terms of $U(d)^{\otimes n}$ and CINC.

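Building $\Phi_k$ requires some preliminary remarks. Suppose we have $\xi \in \mathbb{C}$, $|\xi| = 1$. Consider the diagonal unitary of the corresponding geometric sequence: $D = \sum_{j=0}^{d-1} \xi^j |j\rangle\langle j|$. Recall that INC is the increment permutation, i.e. $\text{INC}|j\rangle = |(j+1) \bmod d\rangle$. Thus permuting the diagonal entries, $\text{INC}\, D\, \text{INC}^{-1} = \xi^{d-1}|0\rangle\langle 0| + \sum_{j=1}^{d-1} \xi^{j-1}|j\rangle\langle j|$. Hence

$\text{INC}\, D\, \text{INC}^{-1} D^{-1} = \xi^{d-1}|0\rangle\langle 0| + \xi^{-1}\sum_{j=1}^{d-1}|j\rangle\langle j| = \xi^{-1}[I_d + (\xi^d - 1)|0\rangle\langle 0|]$. (4)

Now generalizing a standard trick from qubits, note further that

$\wedge_1(\xi I_d) = \left( \sum_{j=0}^{d-2} |j\rangle\langle j| + \xi\, |d-1\rangle\langle d-1| \right) \otimes I_d$, (5)

so that a controlled global-phase is in fact a local operation. Hence taking $\xi = e^{i\theta_k/d}$, we obtain in particular an expression for $\Phi_k$ of Equation 3 in terms of CINC and $\text{CINC}^{-1}$:

$\Phi_k = \wedge_1(\xi I_d)\, \text{CINC}\, (I_d \otimes D)\, \text{CINC}^{-1}\, (I_d \otimes D^{-1})$
$\quad = \left[ \left( \sum_{j=0}^{d-2} |j\rangle\langle j| + \xi\, |d-1\rangle\langle d-1| \right) \otimes I_d \right] \text{CINC}\, (I_d \otimes D)\, \text{CINC}^{-1}\, (I_d \otimes D^{-1})$. (6)

Hence, $\wedge_1(V)$ may be realized using gates from $U(d)^{\otimes n}$ along with d copies of CINC and d copies of $\text{CINC}^{-1}$.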
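The identity behind Equations 4 and 6 can be checked numerically. The following is a minimal sketch, assuming only that every matrix involved is diagonal so that it suffices to track diagonal entries; the function names are illustrative and not taken from the paper.

```python
import cmath

def inc_conjugation_ratio(d, xi):
    """Diagonal of INC * D * INC^{-1} * D^{-1} for D = diag(xi^0, ..., xi^(d-1))."""
    diag_D = [xi ** j for j in range(d)]
    # Conjugation by INC moves the diagonal entry at position j-1 to position j (mod d).
    shifted = [diag_D[(j - 1) % d] for j in range(d)]
    return [shifted[j] / diag_D[j] for j in range(d)]

def phase_gate_diagonal(d, theta):
    """Diagonal of xi * INC D INC^{-1} D^{-1} with xi = exp(i*theta/d).

    By Equation 4 this should equal I_d + (exp(i*theta) - 1)|0><0|.
    """
    xi = cmath.exp(1j * theta / d)
    return [xi * r for r in inc_conjugation_ratio(d, xi)]

diagonal = phase_gate_diagonal(5, 0.7)
```

Here `diagonal[0]` carries the phase $e^{i\theta}$ and every other entry is 1 up to rounding, which is the one-qudit operator that $\Phi_k$ applies when its control fires.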

Recall that these circuits may be expanded into circuits in terms of $\wedge_1(\sigma_x \oplus I_{d-2})$. Indeed, when viewed as permutations, INC and $\text{INC}^{-1}$ factor into $d-1$ flips. To see this, consider $0 \leq j < k \leq d-1$ and let $(jk)$ denote the flip permutation $j \leftrightarrow k$ of $\{0, 1, \ldots, d-1\}$. Then

$\text{INC} = (0\,1) \circ (1\,2) \circ \cdots \circ (d-2\ \ d-1)$. (7)

Since $\wedge_1[(jk)]$ is equivalent to $\wedge_1(\sigma_x \oplus I_{d-2})$ up to permutations within $U(d)^{\otimes n}$, we see that CINC and $\text{CINC}^{-1}$ may each be implemented using $d-1$ copies of the controlled-flip. Thus, $\wedge_1(V)$ may also be realized using $2d(d-1)$ copies of the $\wedge_1(\sigma_x \oplus I_{d-2})$ gate.
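The factorization of INC into adjacent flips, and the resulting gate count, can be sketched as follows. Permutations are stored as lists, and the product is read right to left (the rightmost flip acts first), which is an assumption about the composition convention; the helper names are hypothetical.

```python
def compose(p, q):
    """Return the permutation p∘q as a list (apply q first, then p)."""
    return [p[q[x]] for x in range(len(p))]

def flip(d, j, k):
    """The transposition (j k) on {0, ..., d-1}."""
    perm = list(range(d))
    perm[j], perm[k] = perm[k], perm[j]
    return perm

def inc_from_flips(d):
    """Compose (0 1)∘(1 2)∘...∘(d-2 d-1), with the rightmost factor applied first."""
    perm = list(range(d))
    for j in range(d - 1):  # walk through the product from left to right
        perm = compose(perm, flip(d, j, j + 1))
    return perm

def controlled_flip_count(d):
    """d copies each of CINC and CINC^{-1}, each costing d-1 controlled flips."""
    return 2 * d * (d - 1)
```

With this convention `inc_from_flips(d)` maps j to (j+1) mod d. Reading the same product left to right would instead give the inverse permutation, which also uses $d-1$ flips.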

Remark: Note that the controlled-flip is also equivalent to $\wedge_1(I_{d-2} \oplus \sigma_z)$, making blockwise use of the $2 \times 2$ matrix identity $H \sigma_x H = \sigma_z$ for $H = \frac{1}{\sqrt{2}} \sum_{j,k=0}^{1} (-1)^{jk} |k\rangle\langle j|$. Thus, the above also realizes $\wedge_1(V)$ in roughly $2d^2$ controlled-$\pi$ phase gates. This is half the roughly $4d^2$ gates of earlier work [3], even after including the arbitrary relative phase $e^{i\theta}$ allowed there.



4 Qudit control without ancillas 

In this section we simulate a $\wedge_{n-1}(V)$ gate for $V \in U(d)$ using $O[(n+1)^{\log_2 d + 2}]$ singly-controlled one-qudit gates without ancilla. The method parallels the techniques used in Ref. [1] for universal computation with qubits.

First we decompose a $\wedge_{n-1}(V)$ gate using a sequence of gates with a smaller number of controls. As a first step, notice that

$\wedge_{n-1}(V) = \wedge_{n-2}(X_{n-1})\, [\wedge_{n-2}(\text{INC})\, \wedge_1(X_{n-1}^\dagger)]^{d-1}\, \wedge_{n-2}(\text{INC})\, \wedge_1(X_{n-1}^{d-1})$, (8)

where $X_{n-1} = V^{1/d}$. For example, for n = 7, we have the following circuit:

[Circuit (9): the decomposition of Equation 8 drawn for n = 7 qudits.]
All control operations are conditioned on the control qudits being in state $|d-1\rangle$. The circuit is designed to cycle over each possible dit value of the control qudit in the $\wedge_1(X_{n-1})$ gates. The entire construction then follows by recursive application of Equation 8 to the last gate. In theory, this construction is an exact implementation of $\wedge_{n-1}(V)$. Yet in practice, the sequence of matrices $X_j$ obtained by taking the d-th root of $X_{j+1}$ (with $X_n = V$) quickly converges to the identity matrix as j decreases. Hence, an approximate implementation results if the recursion is terminated early.
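The cycling argument can be checked on computational basis states of the controls. The sketch below is an illustration, not the authors' code: because all powers of X commute, it only tracks the net exponent of X applied to the target, and it treats the first n-2 controls as a single flag `outer_fire` (true when all of them are in state d-1). Both simplifications are assumptions of this example.

```python
def target_exponent(d, outer_fire, b):
    """Net power of X applied to the target by the right-hand side of Equation 8.

    outer_fire: True iff the first n-2 control qudits are all in state d-1.
    b: initial value of the last control qudit.
    Gates are applied right to left, as operators act on kets.
    Returns (exponent, final value of the last control qudit).
    """
    exponent = 0
    # Rightmost gate: a singly-controlled X^(d-1), firing iff the last control is d-1.
    if b == d - 1:
        exponent += d - 1
    # Multiply-controlled INC on the last control qudit.
    if outer_fire:
        b = (b + 1) % d
    # d-1 repetitions of [controlled INC][controlled X^dagger]: X^dagger acts first.
    for _ in range(d - 1):
        if b == d - 1:
            exponent -= 1
        if outer_fire:
            b = (b + 1) % d
    # Leftmost gate: X controlled on the first n-2 qudits.
    if outer_fire:
        exponent += 1
    return exponent, b
```

Since $X^d = V$, a net exponent of d means V was applied and a net exponent of 0 means the identity was applied. The last control qudit returns to its initial value because INC is applied d times when the outer controls fire.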

As an example of Equation 8, consider the generalized Toffoli gate $\wedge_2(\text{INC})$. This breaks into $(d+1)$ variants of singly-controlled $\wedge_1(W)$ gates along with d extra CINC gates. Hence $(d+1)d + d$ CINC gates along with $(d+1)d$ $\text{CINC}^{-1}$ gates and sundry gates from $U(d)^{\otimes n}$ suffice to emulate $\wedge_2(\text{INC})$.

Note that the size of the circuit for $\wedge_{n-2}(\text{INC})$ that is analogous to the above grows exponentially in n. However, it is possible to simulate $\wedge_{n-2}(\text{INC})$ more efficiently using a sequence of $\wedge_{\lceil (n-1)/2 \rceil}(\text{INC})$ and $\wedge_{\lfloor (n-1)/2 \rfloor}(\text{INC})$ gates, proceeding recursively down to $\wedge_2(\text{INC})$. The argument is analogous to that used for qubits in Lemma 7.3 in Ref. [1] for $n \geq 5$. The following circuit illustrates the method for n = 7:

[Circuit (10): emulation of $\wedge_{n-2}(\text{INC})$ for n = 7 qudits.]

Ignoring which qudits are controlled or targeted, the circuit sequence is $\wedge_{n-2}(\text{INC}) = [\wedge_{\lfloor (n-1)/2 \rfloor}(\text{INC})\, \wedge_{\lceil (n-1)/2 \rceil}(\text{INC})]^d$.

For the remainder of this section, we use a tilde to distinguish a count for $\text{CINC}^{-1}$ from a CINC count. Thus, we let $b_{n-2}$ be the total number of CINC gates required to emulate $\wedge_{n-2}(\text{INC})$, and $\tilde{b}_{n-2}$ be the similar count for $\text{CINC}^{-1}$. For Circuit 10,

$b_{n-2} = d\,(b_{\lceil (n-1)/2 \rceil} + b_{\lfloor (n-1)/2 \rfloor})$,
$\tilde{b}_{n-2} = d\,(\tilde{b}_{\lceil (n-1)/2 \rceil} + \tilde{b}_{\lfloor (n-1)/2 \rfloor})$. (11)

A quick induction shows that each sequence is increasing, and thus $b_{n-2} \leq 2d\, b_{\lceil (n-1)/2 \rceil}$ and $\tilde{b}_{n-2} \leq 2d\, \tilde{b}_{\lceil (n-1)/2 \rceil}$. Moreover, by the analysis of $\wedge_2(\text{INC})$ above, $b_2 = d^2 + 2d$ and $\tilde{b}_2 = d^2 + d$. Recalling $(\log_d n)(\log_2 d) = \log_2 n$, we obtain the following:

$b_{n-2} \leq (d^2 + 2d)(2d)(2d)^{\log_2 n} = (d^2 + 2d)(2d)\, n^{1+\log_2 d}$,
$\tilde{b}_{n-2} \leq (d^2 + d)(2d)(2d)^{\log_2 n} = (d^2 + d)(2d)\, n^{1+\log_2 d}$. (12)

Note that these counts assume that the emulation of $\wedge_{n-2}(\text{INC})$ is done on a system with n qudits. Combining this circuit with Circuit 9 allows for an ancilla-free implementation of $\wedge_{n-1}(V)$.

Thus, let $c_{n-1}$ be the number of CINC gates required to emulate $\wedge_{n-1}(V)$, not counting an additional $\tilde{c}_{n-1}$ $\text{CINC}^{-1}$ gates. Using Circuit 9,

$c_{n-1} = d\, b_{n-2} + c_{n-2} + d^2$,
$\tilde{c}_{n-1} = d\, \tilde{b}_{n-2} + \tilde{c}_{n-2} + d^2$. (13)

We may then overestimate $c_{n-1}$ and $\tilde{c}_{n-1}$ using integral comparison and $c_2 = d^2 + 2d$, $\tilde{c}_2 = d^2 + d$, obtaining

$c_{n-1} = d \left( \sum_{j=2}^{n-2} b_j \right) + c_2 + (n-3)d^2$
$\quad \leq d\,[(d^2 + 2d)(2d)] \int_4^{n+1} t^{1+\log_2 d}\, dt + 2d + (n-2)d^2$ (14)
$\quad \leq \frac{(2d^2)(d^2 + 2d)}{2 + \log_2 d} \left[ (n+1)^{2+\log_2 d} - 4d^2 \right] + (n-2)d^2 + 2d$.

We may similarly overestimate $\tilde{c}_{n-1}$:

$\tilde{c}_{n-1} \leq \frac{(2d^2)(d^2 + d)}{2 + \log_2 d} \left[ (n+1)^{2+\log_2 d} - 4d^2 \right] + (n-2)d^2 + d$. (15)

Hence $c_{n-1}$, $\tilde{c}_{n-1}$ are both bounded by $O[(n+1)^{2+\log_2 d}]$. This can be used to show that the earlier spectral algorithm [4] is asymptotically optimal even when ancilla qudits are absent.
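The recursions (11) and (13) and the closed-form overestimate (14) can be compared numerically. This is a minimal sketch for the CINC counts only (the tilde counts follow the same pattern with $d^2 + d$ as the base value). It assumes the index shift $k = n-2$ in recursion (11); the function names are illustrative.

```python
import math
from functools import lru_cache

@lru_cache(maxsize=None)
def b_count(d, k):
    """CINC count b_k for a k-controlled INC, from recursion (11) with b_2 = d^2 + 2d."""
    if k == 2:
        return d * d + 2 * d
    # With k = n - 2: ceil((n-1)/2) = (k+2)//2 and floor((n-1)/2) = (k+1)//2.
    return d * (b_count(d, (k + 2) // 2) + b_count(d, (k + 1) // 2))

def c_count(d, m):
    """CINC count c_m for an m-controlled V, from recursion (13) with c_2 = d^2 + 2d."""
    total = d * d + 2 * d
    for j in range(3, m + 1):
        total = d * b_count(d, j - 1) + total + d * d
    return total

def c_bound(d, n):
    """Right-hand side of the overestimate (14) for c_{n-1}."""
    p = 2 + math.log2(d)
    lead = (2 * d * d) * (d * d + 2 * d) / p
    return lead * ((n + 1) ** p - 4 * d * d) + (n - 2) * d * d + 2 * d
```

For qutrits, `b_count(3, 3)` evaluates to 90 and `c_count(3, 3)` to 69, both far below the corresponding bounds; the closed form is loose but captures the polynomial growth in n.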

If we disallow $\text{CINC}^{-1}$ and rather emulate $\text{CINC}^{-1} = \text{CINC}^{d-1}$, then the overall CINC count for $\wedge_{n-1}(V)$ would be $c_{n-1} + (d-1)\tilde{c}_{n-1}$. Note that if the gate library contains the two-qudit gate $\wedge_1(\sigma_x \oplus I_{d-2})$ rather than CINC, a naive application of the above argument would imply a linear overhead with a factor of $d-1$. However, Circuits 9 and 10 can be adapted by replacing the $\wedge_k(\text{INC})$ gates with gates locally equivalent to $\wedge_1(\sigma_x \oplus I_{d-2})$, resulting in a smaller overhead.

5 Asymptotically optimal qudit state synthesis 

State-synthesis is an important problem in quantum circuit design |Rfl|9|. This section expands upon the earlier account of an asymptotically optimal state synthesis circuit for qudits. The earlier circuit used only $O(d^n)$ two-qudit gates, while a dimension-based argument [4] shows that no fewer ($\Omega(d^n)$) gates may achieve qudit state synthesis. There are two extensions in the present account:

• We introduce the ♣-sequence, a combinatorial gadget that organizes the order in which amplitudes are zeroed while (de)constructing the target state.

• Using the ♣-sequence, we prove that the state synthesis algorithm functions as asserted.

The two-qudit gates are in fact all $\wedge_1(V)$ for V a one-qudit Householder reflection. Hence, earlier sections of the present work further improve the previous circuit.

Recall from the introduction that we prefer to build W with $W|\psi\rangle = |0\rangle$ rather than building U with $U|0\rangle = |\psi\rangle$. We do this by constructing a sequence of factors which introduce more zeros into the partially zeroed state. The ordering established here by the ♣-sequence may be replaced by Gray code ordering [14] in the case d = 2.

5.1 One-qudit Householder reflections 

Earlier universal d = 2 circuits [1] relied on a QR factorization to write any unitary U as a product of Givens rotations, realized in the circuit as k-controlled unitaries. In the multi-level case, we instead use Householder reflections §5-1]. Thus, suppose $|\psi\rangle \in \mathcal{H}(1,d)$, perhaps not normalized. Householder reflections solve the one-qudit case of the inverse state-synthesis problem. Suppose

[Equation (16): definition of the Householder reflection $W$ associated with $|\psi\rangle$.]

Then $W|\psi\rangle$ is a multiple of $|0\rangle$. Geometrically, W is that unitary matrix which reflects across a plane lying between $|0\rangle$ and $|\psi\rangle$.
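As an illustration, the textbook complex Householder construction can be sketched as follows. It is offered as one standard choice that has the stated property, not necessarily the exact form of Equation (16); the sign convention for `alpha` and the function names are assumptions of this example.

```python
import cmath
import math

def householder(phi):
    """Return a unitary, Hermitian W (list of rows) with W*phi a multiple of |0>.

    Uses W = I - 2 v v^dagger / (v^dagger v) with v = phi - alpha*|0> and
    alpha = -exp(i*arg(phi_0)) * ||phi||. Returns the identity if phi is zero.
    """
    d = len(phi)
    eye = [[1.0 if r == c else 0.0 for c in range(d)] for r in range(d)]
    norm = math.sqrt(sum(abs(a) ** 2 for a in phi))
    if norm < 1e-14:
        return eye
    alpha = -cmath.exp(1j * cmath.phase(phi[0])) * norm
    v = [phi[0] - alpha] + list(phi[1:])
    vv = sum(abs(a) ** 2 for a in v)
    return [[eye[r][c] - 2 * v[r] * complex(v[c]).conjugate() / vv
             for c in range(d)] for r in range(d)]

def apply(W, phi):
    """Matrix-vector product W*phi."""
    return [sum(W[r][c] * phi[c] for c in range(len(phi))) for r in range(len(W))]
```

Choosing the phase of `alpha` opposite to that of the first amplitude avoids cancellation in `v[0]`, which is the usual numerical-stability choice. Any other unit phase for `alpha` would also give a valid reflection as long as `v` is nonzero.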
5.2 Inserting zeroes using Householders in ♣-sequence order

The n-qudit techniques require a bit more notation. Any term of the ♣-sequence describes a particular instantiation of a $\wedge_k(V)$ gate, controlled on certain lines determined by the letters with target determined by the first ♣. We next expand the controlled operator notation so as to precisely describe how to extract a control from such a term.

Definition 5.1 [Controlled one-qudit operator $\wedge(C,V)$] Let V be a $d \times d$ unitary matrix, i.e. a one-qudit operator. Let $C = [C_1 C_2 \cdots C_n]$ be a length-n control word composed of letters from the alphabet $\{0, 1, \ldots, d-1\} \cup \{*\} \cup \{T\}$, with exactly one letter in the word being T. By #C we mean the number of letters in the word with numeric values (i.e., the number of controls). The set of control qudits is the corresponding subset of $\{1, 2, \ldots, n\}$ denoting the positions of numeric values in the word. A control word matches an n-dit string if each numeric value matches. Then the controlled one-qudit operator $\wedge(C,V)$ is the n-qudit operator that applies V to the qudit specified by the position of T iff the control word matches the data state's n-dit string. More precisely, in the case when $C_n = T$, then

$\wedge([C_1 C_2 \ldots C_{n-1} T], V)\, |c_1 c_2 \ldots c_n\rangle = \begin{cases} |c_1 \ldots c_{n-1}\rangle \otimes V|c_n\rangle, & c_j = C_j \text{ or } C_j = *,\ 1 \leq j \leq n-1 \\ |c_1 c_2 \ldots c_n\rangle, & \text{otherwise} \end{cases}$ (17)

Alternatively, if $C_j = T$ ($j < n$), we consider the unitary (permutation) operator $\chi_j^n$ that swaps qudits j and n. Thus, $\chi_j^n |d_1 d_2 \ldots d_n\rangle = |d_1 d_2 \ldots d_{j-1} d_n d_{j+1} \ldots d_{n-1} d_j\rangle$. Control on a word $C = [C_1 C_2 \ldots C_{j-1} T C_{j+1} \ldots C_n]$ is then given by $\wedge(C,V) = \chi_j^n\, \wedge(\tilde{C},V)\, \chi_j^n$ for $\tilde{C} = [C_1 C_2 \ldots C_{j-1} C_n C_{j+1} \ldots C_{n-1} T]$.

In our particular state synthesis algorithm, we can factor W so that $\prod_{k=1}^{p} \wedge[C(p-k+1), V(p-k+1)]\, |\psi\rangle = |0\rangle$ with all $\#C(k) \leq 1$ and $p = (d^n - 1)/(d - 1)$. Since each $\#C(k) \leq 1$, each controlled operation is in fact a two-qudit gate. The circuit layout depends on the ♣-sequence, defined in Algorithm 1 and illustrated in Table 1.



n    ♣-sequence, d = 3
1    ♣
2    0♣, 1♣, 2♣, ♣♣
3    00♣, 01♣, 02♣, 0♣♣, 10♣, 11♣, 12♣, 1♣♣, 20♣, 21♣, 22♣, 2♣♣, ♣♣♣
4    000♣, 001♣, 002♣, 00♣♣, 010♣, 011♣, 012♣, 01♣♣, 020♣, 021♣, 022♣, 02♣♣, 0♣♣♣,
     100♣, 101♣, 102♣, 10♣♣, 110♣, 111♣, 112♣, 11♣♣, 120♣, 121♣, 122♣, 12♣♣, 1♣♣♣,
     200♣, 201♣, 202♣, 20♣♣, 210♣, 211♣, 212♣, 21♣♣, 220♣, 221♣, 222♣, 22♣♣, 2♣♣♣, ♣♣♣♣

Table 1: Sample ♣-sequences for d = 3, i.e. qutrits.



Algorithm 1: $\{s_1, \ldots, s_p\}$ = Make-♣-sequence(d, n)

% We return a sequence of $p = (d^n - 1)/(d - 1)$ terms, with n letters each,
% drawn from the alphabet $\{0, 1, \ldots, d-1, ♣\}$.
Let $\{\tilde{s}_j\}_{j=1}^{\tilde{p}}$ = Make-♣-sequence(d, n - 1).
for q = 0, 1, ..., d - 1 do
    The next $(d^{n-1} - 1)/(d - 1)$ terms of the sequence are formed by prefixing the letter q to each
    term of the sequence $\{\tilde{s}_j\}$.
end for
The final term of the sequence is $♣^n$.
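Algorithm 1 translates almost directly into a short recursive function. The sketch below represents terms as strings, which assumes $d \leq 10$ so that each dit is a single character, and treats the empty sequence for n = 0 as the base case of the recursion; both are choices of this example rather than of the paper.

```python
CLUB = "♣"

def make_club_sequence(d, n):
    """Return the club-sequence of (d^n - 1)/(d - 1) terms, each of n letters."""
    if n == 0:
        return []  # base case: no terms on zero qudits
    shorter = make_club_sequence(d, n - 1)
    # Prefix each letter q = 0, ..., d-1 to every term of the shorter sequence.
    sequence = [str(q) + term for q in range(d) for term in shorter]
    # The final term consists of n clubs.
    sequence.append(CLUB * n)
    return sequence

qutrit_terms = make_club_sequence(3, 3)
```

The list `qutrit_terms` reproduces the n = 3 row of Table 1.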



The number of elements in the sequence, $(d^n - 1)/(d - 1)$, equals the number of uncontrolled or singly-controlled one-qudit operators in our state-synthesis circuit. To produce the circuit, it suffices to describe how to extract the control word C from a term t of the ♣-sequence and how to determine V from the term and $|\psi_j\rangle$, where $|\psi_j\rangle = \prod_{k=1}^{j} \wedge[C(p-k+1), V(p-k+1)]\, |\psi\rangle$ is the partial product, as shown in the following algorithm.

Algorithm 2: $\wedge(C,V)$ = Single-♣Householder(♣-term $t = t_1 t_2 \ldots t_n$, n-qudit state $|\psi_j\rangle$)

Initialize C = ** ··· *
% Set the target:
Let $\ell$ be the index of the leftmost ♣ and set $C_\ell = T$.
% Set a single control if needed:
if t contains numeric values greater than 0,
    Let q be the index of the rightmost such value and set $C_q = t_q$.
end if
Given $|\psi_j\rangle = \sum_{k=0}^{d^n-1} \langle k|\psi_j\rangle\, |k\rangle$, form a one-qudit state $|\varphi\rangle = \sum_{k=0}^{d-1} \langle t_1 t_2 \ldots t_{\ell-1}\, k\, 0 0 \ldots 0|\psi_j\rangle\, |k\rangle$.
Form V as a one-qudit Householder such that $V|\varphi\rangle = |0\rangle$.
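The control-word part of Algorithm 2 can be sketched on its own. As in the previous examples, terms and control words are plain strings (so $d \leq 10$ is assumed), and `control_word` is a hypothetical helper name.

```python
CLUB = "♣"

def control_word(term):
    """Extract the control word C (as a string) from a term of the club-sequence."""
    ell = term.index(CLUB)              # target: position of the leftmost club
    word = ["*"] * len(term)
    word[ell] = "T"
    # A single control sits on the rightmost numeric letter greater than 0, if any.
    nonzero = [i for i, letter in enumerate(term[:ell]) if letter != "0"]
    if nonzero:
        q = nonzero[-1]
        word[q] = term[q]
    return "".join(word)
```

For instance, `control_word("2100♣♣♣")` gives `*1**T**`, the seven-qudit example used in Figure 1, while a term with no nonzero letters such as `00♣` yields an uncontrolled operator, `**T`.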



Figure 1 displays the type of gate produced from the output C and V of the algorithm Single-♣Householder. Figure 2 illustrates the order in which these $\wedge(C,V)$ reflections are generated if we iterate over the ♣-sequence. Each node of the tree is labeled by a ♣-term and represents a Householder reflection defined by three elements of $|\psi\rangle$, whose indices are indicated in the node. The reflection zeroes all but the first of these three elements. The reflections are applied by traversing the graph in depth-first order, left to right.



Figure 1: Producing a $\wedge(C,V)$ given V and a term of the ♣-sequence, here t = 2100♣♣♣ for seven qudits. The algorithm for producing C places the V-target symbol T on the leftmost club, here line 5. The active control must then be placed on the least significant line carrying a nonzero prior to line 5, here the 1 on line 2. (A control on lines 3 or 4 would not prevent the nonzero $\alpha_0$ of $|\psi_j\rangle = \sum_{k=0}^{d^n-1} \alpha_k |k\rangle$ from creating new nonzero entries in previously zeroed positions.) Thus in this case, C = *1**T**. The V is chosen to zero all but one $\alpha_k$ for k = 2100♣00.



5.3 Householder circuits for state synthesis 

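We will make use of state synthesis for $|\psi\rangle \mapsto \sqrt{\langle\psi|\psi\rangle}\, |0\rangle$ but also for $|\psi\rangle \mapsto \sqrt{\langle\psi|\psi\rangle}\, |m\rangle$ for any $m = d_1 d_2 \cdots d_n$. We adapt our construction for a collapse onto $|0\rangle$ into an algorithm for collapse onto $|m\rangle$. The idea is to permute the elements to put m in position 0, apply a Single-♣Householder sequence, and then permute back.

Let $m = d_1 d_2 d_3 \ldots d_n$ be a d-ary expansion of some m, $0 \leq m \leq d^n - 1$. Then $|m\rangle = \bigotimes_{k=1}^{n} \text{INC}^{d_k} |0\rangle$. Further, for a generic control word C, define a new m-dependent control word $\tilde{C}$ by

$\tilde{C}_k = \begin{cases} *, & C_k = * \\ T, & C_k = T \\ (C_k + d_k) \bmod d, & C_k \in \{0, 1, \ldots, d-1\} \end{cases}$ (18)

Suppose also that $C_\ell = T$. Then, writing $\oplus m$ for $\text{INC}^m$ and noting that $(\oplus m)^\dagger = \oplus(d - m)$, we have the similarity relation

$\left[ \bigotimes_{k=1}^{n} \text{INC}^{d_k} \right] \wedge(C,V) \left[ \bigotimes_{k=1}^{n} \text{INC}^{d - d_k} \right] = \wedge[\tilde{C}, (\oplus d_\ell)\, V\, (\oplus(d - d_\ell))]$. (19)

This is the basis for the algorithm for state synthesis.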
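The relabeling of Equation 18 can be illustrated with a few lines of code. In this sketch a control word is a list whose entries are either an integer dit, `"*"`, or `"T"`; that representation and the helper names are assumptions of the example. The check below it reflects the control half of Equation 19: the shifted word matches a dit string exactly when the original word matches that string with each dit decreased by $d_k$.

```python
def shifted_control_word(C, m_digits, d):
    """Apply Equation 18: shift each numeric control C_k by d_k modulo d."""
    shifted = []
    for letter, dk in zip(C, m_digits):
        if letter in ("*", "T"):
            shifted.append(letter)          # wildcards and the target are unchanged
        else:
            shifted.append((letter + dk) % d)
    return shifted

def matches(C, dits):
    """True iff every numeric letter of C equals the corresponding dit."""
    return all(letter in ("*", "T") or letter == x for letter, x in zip(C, dits))

example = shifted_control_word([2, "*", "T"], [2, 1, 0], 3)
```

With d = 3, the word `[2, "*", "T"]` and m = 210, the shifted word `example` has control value (2 + 2) mod 3 = 1 on the first qudit.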
Figure 2: Using the ♣-sequence for d = 3, n = 3 to generate Householder reflections to reduce $|\psi\rangle$ to a multiple of $|0\rangle$. Each node is labeled by a ♣-term and represents a Householder reflection $\wedge(C,V)$. The control is indicated by the boldface entry in the label. As the tree is traversed in a depth-first search, each node indicates a $\wedge(C,V)$ that zeroes the components of the last two indices in each node using the component of the top entry. See also Figure 1 of [4].

Algorithm 3: $\wedge(\tilde{C},\tilde{V})$ = ♣Householder($|\psi\rangle$, m, d, n)
% Reduce $|\psi\rangle$ onto $|m\rangle$.
Let $m = d_1 d_2 \cdots d_n$.
Compute $|\varphi\rangle = \left( \bigotimes_{q=1}^{n} \text{INC}^{d - d_q} \right) |\psi\rangle$.
Produce a sequence of controlled one-qudit operators so that
    $\prod_{k=1}^{p} \wedge[C(p-k+1), V(p-k+1)]\, |\varphi\rangle = |00 \ldots 0\rangle$,
using Single-♣Householder applied to each term of Make-♣-sequence(d, n).
Compute $\left( \bigotimes_{q=1}^{n} \text{INC}^{d_q} \right) \wedge[C(p-k+1), V(p-k+1)] \left( \bigotimes_{q=1}^{n} \text{INC}^{d - d_q} \right) = \wedge[\tilde{C}(p-k+1), \tilde{V}(p-k+1)]$ using Equation 19.

♣Householder applies the sequence of Householder reflections generated by Single-♣Householder. The resulting unitary W, although not a Householder reflection itself, satisfies $W|\psi\rangle = |m\rangle$, as we prove in the next subsection. Moreover, since the circuit contains $O(d^n)$ two-qudit gates, all of which are reversible, we have also produced an optimal gate count for the state synthesis problem. Indeed, if we let $U = W^\dagger$, then we have $U|0\rangle = |\psi\rangle$. Moreover, if we label $p(n) = (d^n - 1)/(d - 1)$, then $U = \prod_{k=1}^{p} \wedge(C(k), V(k)^\dagger)$ costs $p(n) = O(d^n)$ gates.
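The whole procedure for m = 0 can be simulated classically on a small example. The following is a minimal sketch under several assumptions: amplitudes are stored in a dictionary keyed by dit-tuples, the Householder reflection uses the standard textbook form (which may differ from Equation 16 by a phase convention), $d \leq 10$ so terms can be strings, and all function names are illustrative. It combines the ♣-sequence, the control-word rule, and the one-qudit reflections, then applies each $\wedge(C,V)$ to the state.

```python
import cmath
import itertools
import math
import random

CLUB = "♣"

def make_club_sequence(d, n):
    """Algorithm 1: the (d^n - 1)/(d - 1) terms of the club-sequence."""
    if n == 0:
        return []
    shorter = make_club_sequence(d, n - 1)
    return [str(q) + s for q in range(d) for s in shorter] + [CLUB * n]

def householder(phi):
    """Unitary reflection sending phi to a multiple of |0> (identity if phi = 0)."""
    d = len(phi)
    eye = [[1.0 if r == c else 0.0 for c in range(d)] for r in range(d)]
    norm = math.sqrt(sum(abs(a) ** 2 for a in phi))
    if norm < 1e-14:
        return eye
    alpha = -cmath.exp(1j * cmath.phase(phi[0])) * norm
    v = [phi[0] - alpha] + list(phi[1:])
    vv = sum(abs(a) ** 2 for a in v)
    return [[eye[r][c] - 2 * v[r] * v[c].conjugate() / vv for c in range(d)]
            for r in range(d)]

def random_state(d, n, seed):
    """Random unnormalized n-qudit state as a dict: dit-tuple -> complex amplitude."""
    rng = random.Random(seed)
    return {idx: complex(rng.gauss(0, 1), rng.gauss(0, 1))
            for idx in itertools.product(range(d), repeat=n)}

def synthesize(state, d, n):
    """Drive `state` onto a multiple of |00...0>; return (final state, gate count)."""
    gate_count = 0
    for term in make_club_sequence(d, n):
        ell = term.index(CLUB)                      # target: leftmost club
        prefix = tuple(int(ch) for ch in term[:ell])
        nonzero = [i for i, x in enumerate(prefix) if x > 0]
        control = nonzero[-1] if nonzero else None  # rightmost nonzero letter
        tail = (0,) * (n - ell - 1)
        phi = [state[prefix + (k,) + tail] for k in range(d)]
        V = householder(phi)
        new_state = dict(state)
        for idx in state:
            if idx[ell] != 0:
                continue                            # one representative per orbit
            if control is not None and idx[control] != prefix[control]:
                continue                            # the single control does not fire
            orbit = [idx[:ell] + (k,) + idx[ell + 1:] for k in range(d)]
            for r in range(d):
                new_state[orbit[r]] = sum(V[r][c] * state[orbit[c]] for c in range(d))
        state = new_state
        gate_count += 1
    return state, gate_count
```

Running `synthesize(random_state(3, 3, seed=1), 3, 3)` uses 13 reflections, matching $(3^3 - 1)/(3 - 1)$, and leaves all weight on the amplitude indexed by (0, 0, 0) up to floating-point error. The simulation stores all $d^n$ amplitudes, so it is only meant for checking small cases.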

We postpone applications to §7 and next prove that Algorithm 3 is correct. The proof is new and is organized in terms of the ♣-sequence.

5.4 Proof that ♣-Householder achieves $W|\psi\rangle = |m\rangle$

For simplicity, we take m = 0, neglecting the permutations. Given n, $p(n) = (d^n - 1)/(d - 1)$ is the number of elements of the ♣-sequence. It would suffice to prove (i) that each operator $\wedge[C(j),V(j)]$ guarantees $d-1$ new zeroes in the state $|\psi_{j+1}\rangle$ not guaranteed in $|\psi_j\rangle$ and (ii) moreover that $\wedge[C(j),V(j)]$ does not act on previously guaranteed zeroes. The assertion (i) is straightforward and left to the reader; see the caption of Figure 2. However, the second assertion is false. Rather, the controlled one-qudit operators do act on previously zeroed entries, but always replace them with a zero result. We next make this assertion precise and prove it.

Define the index set $S = \{0, 1, \ldots, d^n - 1\}$ and introduce two new sets of dit-strings:

• $S_*(j)$ is the set of dit-strings for which the corresponding amplitude of $|\psi_j\rangle$ is not guaranteed zero by some $\wedge[C(k),V(k)]$, $k < j$.

• $S[C(j)]$ is the set of dit-strings that match $C(j)$, per Definition 5.1.

Also, define $\ell$ to be the index of the target symbol in $C(j)$: $C(j)_\ell = T$. Now there is a group action of $\mathbb{Z}/d\mathbb{Z}$ on the index set S corresponding to addition mod d on the $\ell$-th dit:

$c \bullet_\ell (c_1 c_2 \ldots c_n) = c_1 c_2 \ldots c_{\ell-1}\, (c_\ell + c \bmod d)\, c_{\ell+1} \ldots c_n$. (20)

Since the operator $V(j)$ is applied to qudit $\ell$, the amplitudes (components) of $|\psi_{j+1}\rangle$ are either equal to the corresponding amplitude of $|\psi_j\rangle$ or else are linear combinations of the $|\psi_j\rangle$-amplitudes whose indices lie in the $\mathbb{Z}/d\mathbb{Z}$ orbit contained in $S[C(j)]$. To establish the correctness of ♣Householder, we will prove the following Proposition.

Proposition 5.2 $|\psi_{j+1}\rangle$ has at least $d-1$ more guaranteed zero amplitudes than $|\psi_j\rangle$.

Since ♣Householder sets $j = 1, \ldots, (d^n - 1)/(d - 1)$, this means that the final state has a single nonzero element corresponding to $|0\rangle$ and state synthesis has been achieved. We prove this result using three lemmas. First we write $S_*(j)$ as the union of the three sets $R_1(j)$, $R_2(j)$, and $R_3(j)$ which we now define.

Definition 5.3 Suppose the j-th term of the ♣-sequence is given by $c_1 c_2 \ldots c_{\ell-1}$♣♣...♣. We have $C(j)$ the corresponding control word, with $C(j)_\ell = T$. Consider the following three sets, noting $R_1(j)$ may be empty.

$R_1(j) = \bigcup_{q=0}^{\ell-2} \{ c_1 c_2 \ldots c_q\, k\, 0 0 \cdots 0 \ ;\ k < c_{q+1},\ k \in \{0, 1, \ldots, d-1\} \}$
$R_2(j) = \{ c_1 \cdots c_{\ell-1}\, k\, 0 0 \ldots 0 \ ;\ k \in \{0, 1, \ldots, d-1\} \}$ (21)
$R_3(j) = \{ f_1 \cdots f_{\ell-1} k_\ell k_{\ell+1} \ldots k_n \ ;\ f_1 f_2 \ldots f_{\ell-1} > c_1 c_2 \ldots c_{\ell-1},\ k_\ell, \ldots, k_n \in \{0, 1, \ldots, d-1\} \}$

These sets may be interpreted in terms of Figure 2. Recall the figure recovers the ♣-sequence by doing a depth-first search of the tree. In this context, $S_*(j)$ is the set of possibly nonzero components of $|\psi_j\rangle$ at the j-th node. The subset $R_3(j)$ results from indices that lie in nodes not yet traversed, loosely above the present node in the tree or to the right. The set $R_2(j)$ is precisely the set of indices in the current node, node j. The set $R_1(j)$ is the set of indices of elements that have been previously used to zero other elements and still might remain nonzero themselves; it is the set of indices of elements that were always at the top of nodes already traversed in the depth-first search. Thus, $R_1(j)$ is loosely a set of entries within nodes to the left and perhaps below node j.

The first lemma, along with the third, is used to show that the algorithm does not harm previously-introduced 
zeroes. 

Lemma 5.4 Suppose the $\ell$-th letter of $C(j)$ is the target symbol T, and label $S_*(j) = R_1(j) \sqcup R_2(j) \sqcup R_3(j)$. Then

$(\mathbb{Z}/d\mathbb{Z}) \bullet_\ell \left( S_*(j) \cap S[C(j)] \right) \subseteq S_*(j) \cap S[C(j)]$. (22)

Proof: Due to the choice of a single control on the rightmost nonzero dit to the left of position $\ell$ in the appropriate term of the ♣-sequence, $R_1(j) \cap S[C(j)] = \emptyset$. On the other hand, a direct computation verifies that $(\mathbb{Z}/d\mathbb{Z}) \bullet_\ell R_2(j) \subseteq R_2(j)$ and also that $R_2(j) \cap S[C(j)] = R_2(j)$.

Finally, we argue that $(\mathbb{Z}/d\mathbb{Z}) \bullet_\ell R_3(j) \subseteq R_3(j)$. However, the following partition is in general nontrivial:

$R_3(j) = \{ R_3(j) \cap S[C(j)] \} \sqcup \{ R_3(j) \cap (S - S[C(j)]) \}$. (23)

Should $C(j)$ admit no control, we are done. If not, let $m < \ell$ be the control qudit. Then

$R_3(j) \cap S[C(j)] = \{ f_1 \cdots f_{\ell-1} k_\ell k_{\ell+1} \ldots k_n \ ;\ f_m = c_m,\ f_1 \ldots f_{\ell-1} > c_1 c_2 \ldots c_{\ell-1},\ k_\ell, \ldots, k_n \in \{0, 1, \ldots, d-1\} \}$. (24)

Hence the $\mathbb{Z}/d\mathbb{Z}$ action respects the partition of Equation 23 as well. □

The second lemma shows that the algorithm produces $d-1$ newly guaranteed zeroes at each step.

Lemma 5.5 Let $C(j)$, $\ell$, and $S_*(j)$ be as above, with $C(j)$ resulting from $c_1 c_2 \ldots c_{\ell-1}$♣♣...♣ of the ♣-sequence. Let $Z = \{ c_1 c_2 \ldots c_{\ell-1}\, k\, 0 0 \ldots 0 \ ;\ k \in \{1, 2, \ldots, d-1\} \}$ be the elements zeroed by $\wedge(C(j),V(j))$. Then $R_1(j) \cup R_2(j) \cup R_3(j) = R_1(j+1) \cup R_2(j+1) \cup R_3(j+1) \cup Z$.

Proof: We break our argument into two cases based on the value of $c_{\ell-1}$.

Case $c_{\ell-1} < d-1$: The (j+1)-st term of the ♣-sequence is given by $c_1 c_2 \ldots (c_{\ell-1} + 1)\, 0 0 \ldots 0$♣. Note that for leaves of the tree, the buffering sequence of zeroes is vacuous.

$R_1(j+1) = R_1(j) \cup R_2(j) - Z$,
$R_2(j+1) \cup R_3(j+1) = R_3(j)$. (25)

Hence $R_1(j) \cup R_2(j) \cup R_3(j) = R_1(j+1) \cup R_2(j+1) \cup R_3(j+1) \cup Z$.

Case $c_{\ell-1} = d-1$: Suppose instead the j-th ♣-sequence term is $c_1 c_2 \ldots c_{\ell-2} (d-1)$♣♣...♣, so that the (j+1)-st term is $c_1 c_2 \ldots c_{\ell-2}$♣♣♣...♣. We note that $c_1 c_2 \ldots c_{\ell-2} (d-1) 0 \ldots 0 \in R_2(j) \cap R_2(j+1)$.* Then

$R_1(j) = R_1(j+1) \cup R_2(j+1) - \{ c_1 c_2 \ldots c_{\ell-2} (d-1) 0 \ldots 0 \}$,
$R_2(j) = Z \cup \{ c_1 c_2 \ldots c_{\ell-2} (d-1) 0 \ldots 0 \}$, (26)
$R_3(j) = R_3(j+1)$.

From the first two, $R_1(j) \cup R_2(j) = R_1(j+1) \cup R_2(j+1) \cup Z$. Hence $R_1(j) \cup R_2(j) \cup R_3(j) = R_1(j+1) \cup R_2(j+1) \cup R_3(j+1) \cup Z$. □
The third lemma shows that the set we considered in Lemma 5.4 is indeed the set of indices of amplitudes not guaranteed to be zero.

Lemma 5.6 $S_*(j) = R_1(j) \cup R_2(j) \cup R_3(j)$ is the set of indices of amplitudes (components) of a generic $|\psi_j\rangle$ that are not guaranteed zero.

Proof: The proof is by induction. For j = 1, we have

$R_1(1) = \emptyset$, $R_2(1) = \{00 \ldots 0*\}$, $R_3(1) = \{ c_1 c_2 \ldots c_{n-1} * \ ;\ \text{some } c_j > 0 \}$. (27)

Hence the entire index set $S = S_*(1) = R_1(1) \cup R_2(1) \cup R_3(1)$.

Hence, we suppose by way of induction that $S_*(j) = R_1(j) \cup R_2(j) \cup R_3(j)$ and attempt to prove the similar statement for j + 1. Now $\wedge[C(j),V(j)]$ will add new zeroes to the amplitudes (components) with indices Z by Lemma 5.5. On the other hand, $\wedge[C(j),V(j)]$ will not destroy any existing zero amplitudes (those with indices outside $S_*(j)$) due to the induction hypothesis and Lemma 5.4. Thus $S_*(j+1) = R_1(j+1) \cup R_2(j+1) \cup R_3(j+1)$. □

Proof of Proposition 5.2: The main result now follows after combining our three lemmas. □



6 Unitary synthesis by reduction to triangular form 

In this section, we present an asymptotically optimal unitary circuit not found in earlier work. It leans heavily on the optimal state-synthesis of ♣Householder. Since this state-synthesis circuit can likewise clear any length $d^n$ vector using fewer than $d^n$ single controls, the asymptotic is perhaps unsurprising. Yet the unitary circuit requires highly-controlled one-qudit unitary operators when clearing entries near the diagonal. Optimality persists since these are used sparingly. Two themes should be made clear at the outset:

• We process the size $d^n \times d^n$ unitary V in subblocks of size $d^{n-1} \times d^{n-1}$.

• Due to rank considerations, at least one block in each block-column of size $d^n \times d^{n-1}$ must remain full rank throughout.

*So in the application, the amplitude (component) of this index is the single amplitude not zeroed by A[C(j),V (j)], but it is immediately 
afterwards zeroed by A[C(j + l),V(j+ 1)]. 
Hence, we cannot carelessly zero subcolumns. One solution is to triangularize the $d^{n-1} \times d^{n-1}$ matrices on the 
block diagonal, recursively. Given that strategy, the counts below show only $O(n^2 d^n)$ fully $(n-1)$-controlled 
one-qudit operations appear in the algorithm. This is allowed when working towards an asymptotic of $O(d^{2n})$ 
gates. 

The organization for the algorithm is then as follows. Processing (triangularization) of $V$ moves along block-columns 
of size $d^n \times d^{n-1}$ from left to right. In each block-column, we first triangularize the $d^{n-1} \times d^{n-1}$ 
block-diagonal element, perhaps adding a control on the most significant qudit to a circuit produced by recursive 
triangularization. After this recursion, we zero the blocks below the block-diagonal element one column at a time. 
For each column $j$, $0 \le j \le d^{n-1} - 1$, the zeroing process is to collapse the $d^{n-1} \times 1$ subcolumns onto their $j$th 
entries, again adding a control on the most significant qudit to prevent destroying earlier work. These subcolumn 
collapses produce the bulk of the zeroes and are done using ♣Householder. After this, fewer than $d$ entries remain 
to be zeroed in the column below the diagonal. These are eliminated using a controlled reflection containing $n-1$ 
controls and targeting the top line. 

We now give a formal statement of the algorithm. We emphasize the addition of controls when previously 
generated circuits are incorporated into the universal circuit (i.e. recursively telescoping control.) 

Algorithm 4: Triangle($U$, $d$, $n$) 
if $n = 1$ then 
    Triangularize $U$ using a QR reduction. 
else 
    Reduce the top-left $d^{n-1} \times d^{n-1}$ subblock using Triangle($*$, $d$, $n-1$) (writing output to the bottom 
    $n-1$ circuit lines). 
    for $m = 0, 1, \ldots, d-1$ do    % Block-column iteration 
        for columns $j = m d^{n-1}, \ldots, (m+1)d^{n-1} - 1$ do 
            for $\ell = (m+1), \ldots, (d-1)$ do    % Block-row iteration 
                Use ♣Householder to zero the column entries $(m+\ell)d^{n-1}, \ldots, (m+\ell+1)d^{n-1} - 1$, 
                leaving a nonzero entry at $(m+\ell)c_2 \ldots c_n$ for $j = c_1 c_2 \ldots c_n$ and 
                adding an $|m+\ell\rangle$-control on the most significant qudit. 
            end for 
            Clear the remaining nonzero entries below the diagonal using one $\wedge(T c_2 \ldots c_n, V)$. 
        end for    % All subdiagonal entries zero in block-column 
        Use Triangle($*$, $d$, $n-1$) on the $d^{n-1} \times d^{n-1}$ matrix at the $(m+1)$st block diagonal, 
        adding an $|m+1\rangle$-control to the most significant qudit. 
    end for 
end if-else 



To generate a circuit for a unitary operator $U$, we use Triangle to reduce $U$ to a diagonal operator $W = 
\sum_{j=0}^{d^n-1} e^{i\phi_j} |j\rangle\langle j|$. Now $V$ and $U = WV$ would be indistinguishable if a von Neumann measurement $\{|j\rangle\langle j|\}_{j=0}^{d^n-1}$ 
were made after each computation. However, the diagonal is important if $U$ is a computation corresponding to 
a subblock of the circuit of a larger computation with other trailing, entangling interactions. In this case, the 
diagonal unitary can be simulated with d" A„_i (V) gates. Writing j in its d-ary expansion, j = joj\ . . .j n -\ we 
have W = Uf^o ®Li INC f A «-i (e'^ l</_1> ^ -11 ) ®" k=1 INCJ A . By the argument in g| the gate count for such a 
simulation is 0[d n (n — \) 2+lo &2 d ]. This is asymptotically irrelevant compared to the lower bound. 
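The fact that triangularizing a unitary leaves only a diagonal of phases can be checked classically. The following is a minimal sketch, not the Triangle circuit itself: it clears subdiagonal entries with two-level (Givens) rotations rather than with controlled Householder reflections, and it uses a discrete Fourier matrix as an arbitrary sample unitary. The helper names `dft_unitary` and `triangularize` are hypothetical.

```python
import cmath
import math

def dft_unitary(N):
    """N x N discrete Fourier matrix, used here only as a sample unitary."""
    w = cmath.exp(2j * math.pi / N)
    return [[w ** (r * c) / math.sqrt(N) for c in range(N)] for r in range(N)]

def triangularize(U):
    """Zero every subdiagonal entry of U with two-level (Givens) rotations.

    Returns the reduced matrix W. For unitary input, W is upper triangular
    and unitary, hence diagonal up to rounding error.
    """
    N = len(U)
    W = [row[:] for row in U]
    for col in range(N):
        # Clear the column from the bottom up, pairing adjacent rows.
        for row in range(N - 1, col, -1):
            a, b = W[row - 1][col], W[row][col]
            r = math.hypot(abs(a), abs(b))
            if r < 1e-15:
                continue
            # Unitary 2x2 rotation mapping (a, b) to (r, 0).
            g00, g01 = a.conjugate() / r, b.conjugate() / r
            g10, g11 = -b / r, a / r
            for c in range(N):
                x, y = W[row - 1][c], W[row][c]
                W[row - 1][c] = g00 * x + g01 * y
                W[row][c] = g10 * x + g11 * y
    return W

if __name__ == "__main__":
    W = triangularize(dft_unitary(9))   # d = 3, n = 2
    print([round(abs(W[j][j]), 6) for j in range(9)])
```

The surviving diagonal entries all have unit modulus; they play the role of the phases $e^{i\phi_j}$ of $W$.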



6.1 Counting gates and controls 

Let $h(n,k)$ be the number of $k$-controls required in the Single-♣ Householder reduction of some $|\psi\rangle \in \mathcal{H}(n,d)$. 
Then clearly $h(n,k) = 0$ for $k \ge 2$. Moreover, each 0-control results from an element of the ♣-sequence of the 
form $00\ldots0\clubsuit\clubsuit\ldots\clubsuit$, and there are $n$ such sequences. Thus, since the number of elements of the ♣-sequence is 
$(d^n-1)/(d-1)$, we see that 

$$ h(n,1) = (d^n-1)/(d-1) - n, \qquad h(n,0) = n. \tag{28} $$
We next count controls in the matrix algorithm Triangle. We break the count into two pieces: $g$ for the work 
outside the main diagonal blocks and $f$ for the total work. 

Let $g(n,k)$ be the number of $k$-controls applied in operations in each column that zero the matrix below the 
block diagonal; this is the total work in the for $j$ loops of Triangle. We use Single-♣ Householder $d(d-1)d^{n-1}/2$ 
times since there are $d(d-1)/2$ blocks of size $d^{n-1} \times d^{n-1}$ below the block diagonal, and we add a single control 
to those counted in $h$. The last statement in the loop is executed $d^n - d^{n-1}$ times. Therefore, letting $\delta_j^k$ be the 
Kronecker delta, the counts are 

$$ g(n,k) = \delta_k^{n-1}\,(d^n - d^{n-1}) + \tfrac{1}{2}\, d(d-1)\, d^{n-1}\, h(n-1,k-1). \tag{29} $$

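Supposing $n > 3$, then we see that 

$$ g(n,k) = \begin{cases} d^n - d^{n-1}, & k = n-1 \\ 0, & n-1 > k \ge 3 \\ \tfrac{1}{2} d^n (d^{n-1} - 1) - \tfrac{1}{2} d^n (d-1)(n-1), & k = 2 \\ \tfrac{1}{2} d^n (d-1)(n-1), & k = 1 \\ 0, & k = 0 \end{cases} \tag{30} $$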
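As a consistency check, the case-by-case counts of Equation (30) can be compared with the defining expression of Equation (29). The sketch below assumes the readings $h(n,0) = n$, $h(n,1) = (d^n-1)/(d-1) - n$ and $h(n,k) = 0$ otherwise; the function names are hypothetical.

```python
def h(d, n, k):
    """Eq. (28): number of k-controls in the state-synthesis reduction."""
    if k == 0:
        return n
    if k == 1:
        return (d**n - 1) // (d - 1) - n
    return 0  # covers k >= 2 and the out-of-range value k = -1

def g_defining(d, n, k):
    """Eq. (29): Kronecker-delta term plus the Householder term."""
    delta = 1 if k == n - 1 else 0
    householder = d * (d - 1) * d**(n - 1) * h(d, n - 1, k - 1) // 2
    return delta * (d**n - d**(n - 1)) + householder

def g_cases(d, n, k):
    """Eq. (30): explicit values, stated for n > 3."""
    if k == n - 1:
        return d**n - d**(n - 1)
    if k == 2:
        return (d**n * (d**(n - 1) - 1) - d**n * (d - 1) * (n - 1)) // 2
    if k == 1:
        return d**n * (d - 1) * (n - 1) // 2
    return 0

if __name__ == "__main__":
    for d in range(2, 6):
        for n in range(4, 8):
            assert all(g_defining(d, n, k) == g_cases(d, n, k) for k in range(n + 1))
    print("Eq. (29) and Eq. (30) agree for d = 2..5, n = 4..7")
```

The integer divisions are exact because $d(d-1)$ is always even.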



Finally, let $f(n,k)$ be the total number of $k$-controlled operations in the Triangle reduction, including the block 
diagonals. This work includes that counted in $g$, plus a recursive call to Triangle before the for $m$ loop, plus 
$(d-1)$ calls within that loop, for a total of 

$$ f(n,k) = g(n,k) + f(n-1,k) + (d-1)\, f(n-1,k-1) \tag{31} $$

with $f(n,0) = 1$ and $f(1,k) = 0$ for $n, k > 0$. 

Using the recursive relation of Equation (31) and the counts of Equation (30) we next argue that Triangle has no 
more than $O(d^{2n})$ controls. The following lemma is helpful. 

Lemma 6.1 For sufficiently large $n$, we have $f(n,k) < d^{2n-k+4}$. 

Proof: By inspection of Equation (30) we see that $g(n,k) < \tfrac{1}{2} d^{2n-k+2}$ for all $k$ and $n$ large. Now $f(n,0) = 1$, 
which we take as an inductive hypothesis while supposing $f(n-1,\ell) < d^{2n-2-\ell+4} = d^{2n-\ell+2}$. Thus, using the 
recursion relation of Equation (31), 

$$ f(n,k) < \tfrac{1}{2} d^{2n-k+2} + d^{2n-k+2} + (d-1)\, d^{2n-k+3} = d^{2n-k+4} \left( \frac{1}{2d^2} + \frac{1}{d^2} + 1 - \frac{1}{d} \right). \tag{32} $$

Now since $d > 3/2$, we must have $\frac{1}{d} > \frac{3}{2d^2}$, whence an inductive proof of the result. □ 
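The recursion of Equation (31) is easy to evaluate exactly, which gives a numerical sanity check of the lemma for small parameters. This is only a sketch under the same assumed readings of Equations (28), (29) and (31) as in the text; the function names are hypothetical.

```python
from functools import lru_cache

def h(d, n, k):
    """Eq. (28): number of k-controls in the state-synthesis reduction."""
    if k == 0:
        return n
    if k == 1:
        return (d**n - 1) // (d - 1) - n
    return 0

def g(d, n, k):
    """Eq. (29): k-controls spent below the block diagonal."""
    delta = 1 if k == n - 1 else 0
    return delta * (d**n - d**(n - 1)) + d * (d - 1) * d**(n - 1) * h(d, n - 1, k - 1) // 2

@lru_cache(maxsize=None)
def f(d, n, k):
    """Eq. (31) with boundary values f(n,0) = 1 and f(1,k) = 0 for k > 0."""
    if k == 0:
        return 1
    if n == 1:
        return 0
    return g(d, n, k) + f(d, n - 1, k) + (d - 1) * f(d, n - 1, k - 1)

if __name__ == "__main__":
    d = 3
    for n in range(2, 6):
        counts = [f(d, n, k) for k in range(n)]
        bounds = [d**(2 * n - k + 4) for k in range(n)]
        print(n, counts, all(c < b for c, b in zip(counts, bounds)))
```

For the small values tested here the bound $f(n,k) < d^{2n-k+4}$ already holds, with a wide margin.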

By the results on controlled one-qudit gates without ancillas, each $k$-controlled single-qudit unitary operator costs $c_k = O[(k+2)^{2+\log_2 d}]$ CINC and 
CINC$^{-1}$ gates without ancillas. The expected number of CINC gates $\ell_T$ for the algorithm Triangle is then given 
by the weighted sum for the $k$-control gates in the diagonalization and the $d^n$ instances of $(n-1)$-controlled phase 
gates for emulation of the diagonal: 

$$ \begin{aligned} \ell_T &= d^n c_{n-1} + \sum_{k=0}^{n-1} c_k f(n,k) \\ &\le 2(n+1)^{2+\log_2 d}\, d^{n+4} + d^{8+2n} \sum_{k=1}^{n-1} d^{-k}\, k^{2+\lceil \log_2 d \rceil} \\ &\le 2(n+1)^{2+\log_2 d}\, d^{n+4} + d^{8+2n}\, \mathrm{Li}_{-(2+\lceil \log_2 d \rceil)}(1/d) \\ &\le 2(n+1)^{2+\log_2 d}\, d^{n+4} + 26\, d^{8+2n}. \end{aligned} \tag{33} $$

In the last line we have used the fact that for the polylogarithm function, $\mathrm{Li}_{-(2+\lceil \log_2 d \rceil)}(1/d) \le \mathrm{Li}_{-3}(1/2) = 26$. 
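The constant 26 can be verified directly. The sketch below uses the standard closed form $\mathrm{Li}_{-3}(z) = z(1+4z+z^2)/(1-z)^4$ and the series $\mathrm{Li}_{-p}(z) = \sum_{k \ge 1} k^p z^k$, truncated at an arbitrarily chosen 200 terms; the function names are hypothetical.

```python
from fractions import Fraction

def li_neg3(z):
    """Closed form Li_{-3}(z) = z (1 + 4 z + z^2) / (1 - z)^4, valid for |z| < 1."""
    return z * (1 + 4 * z + z * z) / (1 - z) ** 4

def polylog_neg(p, z, terms=200):
    """Partial sum of Li_{-p}(z) = sum_{k >= 1} k^p z^k."""
    return sum(k**p * z**k for k in range(1, terms + 1))

def exponent(d):
    """2 + ceil(log2 d), computed with integer arithmetic."""
    return 2 + (d - 1).bit_length()

if __name__ == "__main__":
    print(li_neg3(Fraction(1, 2)))          # exact rational value
    for d in range(2, 11):
        print(d, exponent(d), polylog_neg(exponent(d), 1.0 / d))
```

Equality holds at $d = 2$; for the larger values of $d$ tried here the polylogarithm is smaller.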

6.2 Comparison with the spectral algorithm 

In an earlier work [4], we described a different algorithm for unitary synthesis. That algorithm relied on a spectral 
decomposition of the unitary and was also shown to be asymptotically optimal. For a circuit without ancillas, the 
CINC gate count $\ell_S$ using the spectral algorithm is: 

$$ \ell_S \le 2 d^{n+1} \left[ \frac{d^n - 1}{d-1} - n \right] + (n+1)^{2+\log_2 d}\, d^{n+4}. \tag{34} $$
In Table 2 the exact gate counts resulting from our implementations for unitary synthesis using Triangle and the 
spectral algorithm are tabulated. The result is that for a system with no ancillary resources, the spectral algorithm 
outperforms Triangle when the number of qudits $n$ is greater than two. The general $d^{2n}$ scaling for both is shown 
in Figure 3. 



| n \ d | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|
| 2 | 18 / 18 | 78 / 78 | 220 / 220 | 495 / 495 | 996 / 996 | 1 708 / 1 708 | 2 808 / 2 808 | 4 365 / 4 365 | 6 490 / 6 490 |
| 3 | 192 / 154 | 2 025 / 1 944 | 10 752 / 10 496 | 39 375 / 38 750 | 114 048 / 112 752 | 280 917 / 278 516 | 614 400 / 610 304 | 1 226 907 / 1 220 346 | 2 280 000 / 2 270 000 |
| 4 | 1 152 / 1 056 | 23 085 / 22 113 | 200 704 / 195 584 | 1 096 875 / 1 078 125 | 4 447 872 / 4 393 440 | 14 638 897 / 14 504 441 | 41 287 680 / 40 992 768 | 103 394 799 / 102 804 309 | 235 600 000 / 234 500 000 |
| 5 | 5 504 / 4 928 | 223 074 / 211 410 | 3 317 760 / 3 215 360 | 27 875 000 / 27 312 500 | 161 523 072 / 159 236 928 | 720 717 774 / 713 188 238 | 2 649 227 264 / 2 627 993 600 | 8 386 138 980 / 8 332 994 880 | 23 574 000 000 / 23 453 000 000 |
| 6 | 23 296 / 21 120 | 1 931 121 / 1 856 763 | 50 003 968 / 49 070 080 | | | | | | |
| 7 | 92 672 / 84 224 | 16 605 891 / 16 087 572 | | | | | | | |
| 8 | 353 280 / 324 096 | 141 599 502 / 138 627 369 | | | | | | | |
| 9 | 1 333 248 / 1 246 208 | 1 224 144 819 / 1 209 914 010 | | | | | | | |
| 10 | 5 025 792 / 4 786 176 | 10 741 839 786 / 10 680 015 483 | | | | | | | |
| 11 | 19 128 320 / 18 452 480 | 95 432 986 134 / 95 147 070 876 | | | | | | | |
| 12 | 73 515 008 / 71 639 040 | | | | | | | | |

Table 2: Exact gate counts for unitary synthesis without ancillas as a function of the number, $n$, and dimension, 
$d$, of the qudits. Each cell of the table lists the count for CINC and CINC$^{-1}$ gates using the most efficient of the 
two algorithms presented in the text. Boldface entries indicate that the Triangle algorithm was the most efficient; 
normal face type corresponds to counts using the spectral algorithm. 

There are situations where Triangle may be preferred over the spectral algorithm. The latter requires a classical 
diagonalization of the unitary $U$, which requires $O(d^{3n})$ steps. For matrices of large size, particularly when there 
are degenerate eigenstates, numerical stability can be an issue. The classical computations involved in Triangle 
also scale like $O(d^{3n})$ but are carried out directly in the logical basis of the qudits. 

7 Two applications of state synthesis 

A primary motivation for describing state synthesis circuits is to utilize them as subcircuits for unitary synthesis 
as in §6. Yet there are also independent applications for the state-synth algorithm. We present two such. 

7.1 Computing expected values 

First, consider the problem of computing the expectation value of a Hermitian operator $A$ on $\mathcal{H}(n,d)$, i.e. $A \in 
\mathrm{End}[\mathcal{H}(n,d)] = \mathbb{C}^{d^n \times d^n}$ with $A^\dagger = A$. For a system in the possibly mixed state $\rho$ of $n$ qudits, the expectation 
of an operator $A$ is $\langle A \rangle = \mathrm{Tr}[A\rho]$. In some cases there does not exist a physically realistic direct measurement of 
$A$. However, one may infer the expectation value by a suitably weighted set of von Neumann measurements as 
follows. By the spectral theorem, any normal operator $A$ may be diagonalized by a unitary transformation $U$: 
$A = U^\dagger D U$ where $D = \sum_{j=0}^{d^n-1} \lambda_j |j\rangle\langle j|$ and $\{\lambda_j\}_{j=0}^{d^n-1}$ are the eigenvalues of $A$. Then 

$$ \langle A \rangle = \mathrm{Tr}[A\rho] = \mathrm{Tr}[D U \rho U^\dagger] = \sum_{j=0}^{d^n-1} \lambda_j\, \mathrm{Tr}\!\left[\, |j\rangle\langle j|\, U \rho U^\dagger \right]. \tag{35} $$
Figure 3: Performance comparison of the two algorithms for unitary synthesis on $n = 4$ qudits as a function of 
qudit dimension $d$. Triangles (boxes) indicate CINC gate counts for the Triangle (spectral) algorithm. 



Hence we may compute (A) by performing three steps. 

1. Prepare p. 

2. Enact the unitary evolution U on p. 

3. Perform the computational-basis von Neumann measurement on the resulting state, extracting all populations 
of the basis states $|j\rangle\langle j|$. 
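The three steps can be illustrated on a single qutrit. The sketch below is an assumption-laden toy case rather than part of the construction: it takes $A = \mathrm{INC} + \mathrm{INC}^\dagger$ with $\mathrm{INC}|k\rangle = |k+1 \bmod d\rangle$, which is diagonalized by the Fourier matrix, restricts to a pure state $\rho = |\psi\rangle\langle\psi|$, and replaces the measurement by exact populations. All names are hypothetical.

```python
import cmath
import math

d = 3
omega = cmath.exp(2j * math.pi / d)

# Hermitian test operator A = INC + INC^dagger on one qutrit.
A = [[(1 if r == (c + 1) % d else 0) + (1 if c == (r + 1) % d else 0)
      for c in range(d)] for r in range(d)]
# Eigenvalues 2 cos(2 pi j / d); the Fourier matrix U satisfies A = U^dagger D U.
lam = [2 * math.cos(2 * math.pi * j / d) for j in range(d)]
U = [[omega ** (j * k) / math.sqrt(d) for k in range(d)] for j in range(d)]

def matvec(M, v):
    """Matrix-vector product for small dense matrices."""
    return [sum(M[r][c] * v[c] for c in range(len(v))) for r in range(len(M))]

def expectation_direct(A, psi):
    """<psi| A |psi>, i.e. Tr[A rho] for the pure state rho = |psi><psi|."""
    Apsi = matvec(A, psi)
    return sum(p.conjugate() * q for p, q in zip(psi, Apsi)).real

def expectation_from_populations(U, lam, psi):
    """Steps 2 and 3: evolve by U, read populations, weight by eigenvalues."""
    phi = matvec(U, psi)
    pops = [abs(x) ** 2 for x in phi]
    return sum(l * p for l, p in zip(lam, pops)), pops

# Step 1: prepare an arbitrary normalized state.
raw = [1 + 0j, 0.5j, -0.25 + 0j]
norm = math.sqrt(sum(abs(x) ** 2 for x in raw))
psi = [x / norm for x in raw]

if __name__ == "__main__":
    weighted, pops = expectation_from_populations(U, lam, psi)
    print(expectation_direct(A, psi), weighted, pops)
```

In an experiment the populations would be estimated from repeated measurements on identically prepared copies, so the weighted sum carries statistical error that this exact calculation does not show.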

In some instances one may want to know the weight of a quantum state on a subspace of the operator $A$, i.e. 
$\langle P_S A P_S \rangle$ where $P_S$ is some projection operator onto a subspace $\mathcal{H}_S \subset \mathcal{H}(n,d)$. In particular, consider the case of 
a $k$-dimensional subspace diagonal in the eigenbasis $\{|u_j\rangle\}_{j=0}^{d^n-1}$ of $A$. We wish to compute $\mathrm{Tr}[\sum_{j=1}^{k} \lambda_j |u_j\rangle\langle u_j| \rho]$ 
where $k < d^n$ and the eigenvalues of $A$ have been reordered accordingly. Then we can rewrite the projection 
$P_S A P_S = \sum_{j=1}^{k} \lambda_j W(u_j) |j\rangle\langle j| W(u_j)^\dagger$ where $W(u_j)$ is a unitary extension of the mapping $|j\rangle \to |u_j\rangle$. The operator 
$W(u_j)$ is the unitary obtained in the state-synth algorithm. The expectation value is then 

$$ \langle P_S A P_S \rangle = \sum_{j=1}^{k} \lambda_j\, \mathrm{Tr}\!\left[\, |j\rangle\langle j|\, W(u_j)^\dagger \rho\, W(u_j) \right]. \tag{36} $$

The expectation value can be measured as before but now one need only implement the state-synth operator k 
times on each state p of an ensemble of identically prepared states. 

The above argument may in fact be generalized to compute the expectation value of any operator $A$. First 
decompose the operator as $A = A_h + A_a$ with $A_h = (A + A^\dagger)/2$ the Hermitian part and $A_a = (A - A^\dagger)/2$ the 
anti-Hermitian part of $A$. Both $A_h$ and $A_a$ are normal operators and therefore can be diagonalized. Hence, the 
expectation value can be computed by evaluating the weighted sum as per Eq. (35) and summing. 
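The decomposition into Hermitian and anti-Hermitian parts is elementary to compute. The following sketch uses an arbitrary $2 \times 2$ matrix and a pure state chosen only for illustration; the helper names are hypothetical.

```python
def dagger(M):
    """Conjugate transpose of a square matrix given as nested lists."""
    n = len(M)
    return [[M[c][r].conjugate() for c in range(n)] for r in range(n)]

def split(A):
    """Return (A_h, A_a) with A_h = (A + A^dagger)/2 and A_a = (A - A^dagger)/2."""
    Ad = dagger(A)
    n = len(A)
    Ah = [[(A[r][c] + Ad[r][c]) / 2 for c in range(n)] for r in range(n)]
    Aa = [[(A[r][c] - Ad[r][c]) / 2 for c in range(n)] for r in range(n)]
    return Ah, Aa

def expect(M, psi):
    """<psi| M |psi> for a normalized state vector psi."""
    n = len(psi)
    return sum(psi[r].conjugate() * M[r][c] * psi[c]
               for r in range(n) for c in range(n))

A = [[1 + 2j, 3 + 0j], [4j, 5 - 1j]]   # arbitrary non-normal example
psi = [0.6 + 0j, 0.8j]                 # normalized: 0.36 + 0.64 = 1
Ah, Aa = split(A)

if __name__ == "__main__":
    # The Hermitian part contributes the real part of <A>,
    # the anti-Hermitian part the imaginary part.
    print(expect(A, psi), expect(Ah, psi), expect(Aa, psi))
```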

7.2 The general state synthesis problem 

Both Triangle and the spectral algorithm are well adapted to the general state synthesis problem. This problem 
demands synthesizing any unitary extension of the many-state mapping $\{|j\rangle \to |\psi_j\rangle \,;\ 0 \le j < \ell \ll d^n\}$ [9]. It 
is unclear what sorts of applications might arise when the states are arbitrary, requiring exponentially expensive 
circuits to build each. Nonetheless, less generic unitaries of this form have been used in quantum error correction 
to encode a few logical qudits into many physical qudits [8]. 

Triangle provides one solution to this problem. Start with a matrix containing $|\psi_j\rangle$ in its $j$th column, with 
"don't care" entries in columns after column $\ell$. Ignore any operations on the "don't care" entries, and discard any 
gates meant to place zeros among them. 

The spectral algorithm provides an alternative solution. Note that the matrix $U$ formed from the product of 
the $\ell$ Householder transformations necessary to reduce the $d^n \times \ell$ matrix $[\,|\psi_1\rangle \ldots |\psi_\ell\rangle]$ to diagonal form has $d^n - \ell$ 
eigenvalues equal to 1, so the spectral algorithm needs to build an eigenstate, apply a conditional phase to one 
logical basis ket, and unbuild the eigenstate only $\ell$ times. 

8 Conclusions 

This work concerns asymptotically optimal quantum circuits for qudits. By asymptotically optimal, we mean that 
the circuits require $O(d^n)$ gates of (no more than) two qudits for constructing arbitrary states and $O(d^{2n})$ gates for 
unitary evolutions. Contributions of this work are the following: 

• We provide the first argument that both asymptotics survive even when no ancilla (helper) qudits are allowed. 

• We present the state synthesis circuit in much more detail than previously published, in particular describing 
it in terms of iterates over a ♣-sequence which plays a role similar to Gray codes for bits. Using the ♣-sequence, 
we provide the first proof that the state synthesis circuits actually achieve $U|0\rangle = |\psi\rangle$. 

• We present Triangle, a new asymptotically optimal quantum circuit for qudit unitaries which is inspired 
by QR matrix factorization. Since it leans more heavily on QR than on spectral decomposition, the gate 
parameters of Triangle require less classical pre-processing than the spectral algorithm. Moreover, Triangle 
more closely resembles earlier quantum circuit design techniques ffl ll4l than other asymptotically optimal 
qudit unitary circuits. 

• We provide an elementary proof that $\{\mathrm{CINC}\} \cup U(d)^{\otimes n}$ is exact-universal for qudits. 

Some open questions remain. The $\wedge_1(V)$ gates are much better than earlier practice but not provably optimal, 
as is the case with qubits [12]. Moreover, the current best-practice $n$-qubit circuits exploit the cosine-sine 
decomposition (CSD), yet technical difficulties [13] with the tensor product structure make it quite unclear whether this 
matrix decomposition is useful for qudits. 